Preference-Based Policy Learning

Authors

Riad Akrour

Marc Schoenauer

Michèle Sebag

Abstract

Many machine learning approaches in robotics, whether based on reinforcement learning, inverse optimal control, or direct policy learning, rely critically on robot simulators. This paper investigates a simulator-free approach to direct policy learning, called Preference-based Policy Learning (PPL). PPL iterates a four-step process: the robot demonstrates a candidate policy; the expert ranks this policy against other policies according to her preferences; these preferences are used to learn a policy return estimate; and the robot uses the policy return estimate to build new candidate policies. The process is repeated until the desired behavior is obtained. PPL requires that a good representation of the policy search space be available, both to learn accurate policy return estimates and to limit the human ranking effort needed to yield a good policy. Furthermore, because of the simulator-free setting, this representation cannot use informed features (e.g., how far the robot is from any target). As a second contribution, this paper proposes a representation based on the agnostic exploitation of the robotic log. The convergence of PPL is studied analytically, and its experimental validation is presented on two problems, one involving a single robot in a maze and the other involving two interacting robots.
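The four-step loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it (`learn_return_estimate`, `propose_policy`, `ppl`, `toy_expert`, `hidden_utility`) is hypothetical. It assumes a policy is a short vector of real parameters that doubles as its own feature representation, replaces the human expert with a hidden utility function that only ever answers pairwise comparisons, and uses a simple pairwise perceptron as a stand-in for whatever ranking learner one would actually use.

```python
import random


def dot(w, x):
    """Linear score of a policy vector x under weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def learn_return_estimate(preferences, dim, epochs=50, lr=0.1):
    """Step 3: fit weights w so that dot(w, better) > dot(w, worse).

    Assumption: a pairwise perceptron stands in for the ranking learner.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in preferences:
            diff = [b - c for b, c in zip(better, worse)]
            if dot(w, diff) <= 0:  # pair is tied or mis-ranked: nudge w
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def propose_policy(best, w, rng, n_samples=20, sigma=0.3):
    """Step 4: perturb the best policy, keep the sample the estimate scores highest."""
    samples = [[p + rng.gauss(0.0, sigma) for p in best] for _ in range(n_samples)]
    return max(samples, key=lambda x: dot(w, x))


def ppl(expert_prefers, initial_policy, iterations=30, seed=0):
    """Toy preference-based loop: propose, compare, re-learn, repeat."""
    rng = random.Random(seed)
    best = list(initial_policy)
    preferences = []  # list of (preferred, less preferred) pairs
    w = [0.0] * len(best)
    for _ in range(iterations):
        candidate = propose_policy(best, w, rng)  # steps 4 and 1 (the "demonstration")
        if expert_prefers(candidate, best):       # step 2: expert compares
            preferences.append((candidate, best))
            best = candidate
        else:
            preferences.append((best, candidate))
        w = learn_return_estimate(preferences, len(best))  # step 3
    return best, w


def hidden_utility(policy):
    """Hypothetical stand-in for the expert's taste; never shown to the learner."""
    target = [1.0, -0.5]
    return -sum((p - t) ** 2 for p, t in zip(policy, target))


def toy_expert(a, b):
    """Answers only 'is a preferred to b?', as a human ranker would."""
    return hidden_utility(a) > hidden_utility(b)


if __name__ == "__main__":
    best, weights = ppl(toy_expert, [0.0, 0.0])
    print("best policy:", best)
    print("learned weights:", weights)
```

Because `best` is replaced only when the expert prefers the new candidate, the retained policy never gets worse in the expert's eyes. The learned estimate only steers which candidates get shown, which is how it can reduce the number of comparisons the expert has to make.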

Similar resources

Preference-Based Policy Iteration: Leveraging Preference Learning for Reinforcement Learning

This paper makes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a “preference-based” approach to reinforcement learning is a possible extension of the type of feedback an agent may learn from. In particular, while conventional RL methods are essentially confined to deal with numeri...

Preference-based Reinforcement Learning

This paper investigates the problem of policy search based only on the expert’s preferences. Whereas reinforcement learning classically relies on a reward function, or exploits the expert’s demonstrations, preference-based policy learning (PPL) iteratively builds and optimizes a policy return estimate as follows: The learning agent demonstrates a few policies, is informed of the expert’s prefer...

Towards Preference-Based Reinforcement Learning

This paper makes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a preference-based approach to reinforcement learning is the observation that in many real-world domains, numerical feedback signals are not readily available, or are defined arbitrarily in order to satisfy the needs o...

Caching Policy for Cache-enabled D2D Communications by Learning User Preference

Prior work on designing caching policies does not distinguish content popularity from user preference. In this paper, we optimize caching policy for cache-enabled device-to-device (D2D) communications by exploiting individual user behavior in sending requests for contents. We first show the connection between content popularity and user preference. We then optimize the caching policy with the know...

APRIL: Active Preference Learning-Based Reinforcement Learning

This paper focuses on reinforcement learning (RL) with limited prior knowledge. In the domain of swarm robotics, for instance, the expert can hardly design a reward function or demonstrate the target behavior, which rules out both standard RL and inverse reinforcement learning. Even with limited expertise, the human expert is still often able to express preferences and rank the agent de...

Learning and Teaching Styles in the Focus: The Case of Iranian EFL Learners and Teachers

Underlying any learning and teaching process is a set of preferred Learning Styles (LSs) and Teaching Styles (TSs) which epitomize the overall educational policy and identification of which is sine qua non for any reform of the educational system. This ex-post-facto study scrutinized preference of Iranian EFL teachers' for Expert, Formal Authority, Personal Model, Facilitator, and Delegator TSs...

Publication date: 2011
